LIGHT: Joint Individual Building Extraction and Height Estimation from Satellite Images through a Unified Multitask Learning Network
Building extraction and height estimation are two important basic tasks in
remote sensing image interpretation, which are widely used in urban planning,
real-world 3D construction, and other fields. Most of the existing research
regards the two tasks as independent studies. Therefore the height information
cannot be fully used to improve the accuracy of building extraction and vice
versa. In this work, we combine the individuaL buIlding extraction and heiGHt
estimation through a unified multiTask learning network (LIGHT) for the first
time, which simultaneously outputs a height map, bounding boxes, and a
segmentation mask map of buildings. Specifically, LIGHT consists of an instance
segmentation branch and a height estimation branch. In particular, so as to
effectively unify multi-scale feature branches and alleviate feature spans
between branches, we propose a Gated Cross Task Interaction (GCTI) module that
can efficiently perform feature interaction between branches. Experiments on
the DFC2023 dataset show that our LIGHT can achieve superior performance, and
our GCTI module with ResNet101 as the backbone can significantly improve the
performance of multitask learning by 2.8% AP50 and 6.5% delta1, respectively.
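The delta1 figure quoted above is an accuracy-under-threshold metric. The abstract does not define it, so the following is a minimal sketch assuming the definition commonly used for depth and height estimation: the fraction of valid pixels whose ratio max(pred/gt, gt/pred) is below 1.25. The function name `delta1`, the flat-list inputs, and the rule for skipping non-positive values are illustrative assumptions, not details from the paper.

```python
def delta1(pred, target, threshold=1.25, eps=1e-6):
    """Fraction of valid pixels with max(p/t, t/p) < threshold.

    Assumption: pixels where either value is not positive are skipped,
    because the ratio is undefined there. The paper may treat such
    pixels differently.
    """
    hits = 0
    valid = 0
    for p, t in zip(pred, target):
        if p <= eps or t <= eps:
            continue  # ratio undefined, skip this pixel
        valid += 1
        if max(p / t, t / p) < threshold:
            hits += 1
    return hits / valid if valid else 0.0


# Hypothetical per-pixel heights in meters, flattened to 1-D lists.
predicted = [10.0, 20.0, 30.0, 5.0]
reference = [10.0, 30.0, 29.0, 0.0]
score = delta1(predicted, reference)
```

Here the last pixel is skipped because its reference height is zero, and two of the remaining three pixels fall within the threshold.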
Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models
We consider the problem of eliciting compositional generalization
capabilities in large language models (LLMs) with a novel type of prompting
strategy. Compositional generalization empowers the LLMs to solve problems that
are harder than the ones they have seen (i.e., easy-to-hard generalization),
which is a critical reasoning capability of human-like intelligence. However,
even the current state-of-the-art LLMs still struggle with this form of
reasoning. To bridge this gap, we propose skills-in-context (SKiC) prompting,
which instructs LLMs how to compose basic skills to resolve more complex
problems. We find that it is crucial to demonstrate both the skills and the
compositional examples within the same prompting context. With as few as two
exemplars, our SKiC prompting initiates strong synergies between skills and
their composition capabilities. Notably, it empowers LLMs to solve unseen
problems that require innovative skill compositions, achieving near-perfect
generalization on a broad range of challenging compositionality tasks.
Intriguingly, SKiC prompting unlocks the latent potential of LLMs, enabling
them to leverage pre-existing internal skills acquired during earlier
pre-training stages, even when these skills are not explicitly presented in the
prompting context. This results in the capability of LLMs to solve unseen
complex problems by activating and composing internal competencies. With such
prominent features, SKiC prompting is able to achieve state-of-the-art
performance on challenging mathematical reasoning benchmarks (e.g., MATH).
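The central idea above, placing the basic skills and the compositional exemplars in one prompting context, can be sketched as plain string assembly. The layout, section labels, and the function name `build_skic_prompt` are hypothetical. The abstract does not specify the prompt format, so this only illustrates the "skills plus composition examples in the same context" structure.

```python
def build_skic_prompt(skills, exemplars, question):
    """Assemble one prompt holding skills, composed exemplars, and a query.

    skills:    list of (name, description) pairs
    exemplars: list of (question, worked_solution) pairs showing how the
               skills are composed
    The section headings used here are illustrative assumptions.
    """
    parts = ["Skills:"]
    for name, description in skills:
        parts.append(f"- {name}: {description}")
    parts.append("")
    parts.append("Examples of composing the skills:")
    for ex_question, ex_solution in exemplars:
        parts.append(f"Q: {ex_question}")
        parts.append(f"A: {ex_solution}")
    parts.append("")
    parts.append(f"Q: {question}")
    parts.append("A:")
    return "\n".join(parts)


prompt = build_skic_prompt(
    skills=[("add", "Add two numbers digit by digit.")],
    exemplars=[("What is 1+2?", "Using add: 1+2=3.")],
    question="What is 12+30?",
)
```

The resulting string would then be sent to whichever LLM is being evaluated; that call is omitted because it depends on the model API in use.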
PIVOINE: Instruction Tuning for Open-world Information Extraction
We consider the problem of Open-world Information Extraction (Open-world IE),
which extracts comprehensive entity profiles from unstructured texts. Different
from the conventional closed-world setting of Information Extraction (IE),
Open-world IE considers a more general situation where entities and relations
could be beyond a predefined ontology. More importantly, we seek to develop a
large language model (LLM) that is able to perform Open-world IE to extract
desirable entity profiles characterized by (possibly fine-grained) natural
language instructions. We achieve this by finetuning LLMs using instruction
tuning. In particular, we construct INSTRUCTOPENWIKI, a substantial instruction
tuning dataset for Open-world IE enriched with a comprehensive corpus,
extensive annotations, and diverse instructions. We finetune the pretrained
BLOOM models on INSTRUCTOPENWIKI and obtain PIVOINE, an LLM for Open-world IE
with strong instruction-following capabilities. Our experiments demonstrate
that PIVOINE significantly outperforms traditional closed-world methods and
other LLM baselines, displaying impressive generalization capabilities on both
unseen instructions and out-of-ontology cases. Consequently, PIVOINE emerges as
a promising solution to tackle the open-world challenge in IE effectively.
Residual Shuffling Convolutional Neural Networks for Deep Semantic Image Segmentation Using Multi-Modal Data
In this paper, we address the deep semantic segmentation of aerial imagery based on multi-modal data. Given multi-modal data composed of true orthophotos and the corresponding Digital Surface Models (DSMs), we extract a variety of hand-crafted radiometric and geometric features, which are provided separately and in different combinations as input to a modern deep learning framework. The latter is represented by a Residual Shuffling Convolutional Neural Network (RSCNN) combining the characteristics of a Residual Network with the advantages of atrous convolution and a shuffling operator to achieve dense semantic labeling. Via performance evaluation on a benchmark dataset, we analyze the value of different feature sets for the semantic segmentation task. The derived results reveal that the use of radiometric features yields better classification results than the use of geometric features for the considered dataset. Furthermore, the consideration of data from both modalities leads to an improvement of the classification results. However, the derived results also indicate that the use of all defined features is less favorable than the use of selected features. Consequently, data representations derived via feature extraction and feature selection techniques still provide a gain if used as the basis for deep semantic segmentation.
DCP-Net: A Distributed Collaborative Perception Network for Remote Sensing Semantic Segmentation
Onboard intelligent processing is widely applied in emergency tasks in the
field of remote sensing. However, it is predominantly confined to an individual
platform with a limited observation range as well as susceptibility to
interference, resulting in limited accuracy. Considering the current state of
multi-platform collaborative observation, this article innovatively presents a
distributed collaborative perception network called DCP-Net. Firstly, the
proposed DCP-Net helps members to enhance perception performance by integrating
features from other platforms. Secondly, a self-mutual information match module
is proposed to identify collaboration opportunities and select suitable
partners, prioritizing critical collaborative features and reducing redundant
transmission cost. Thirdly, a related feature fusion module is designed to
address the misalignment between local and collaborative features, improving
the quality of fused features for the downstream task. We conduct extensive
experiments and visualization analyses using three semantic segmentation
datasets, including Potsdam, iSAID and DFC23. The results demonstrate that
DCP-Net outperforms the existing methods comprehensively, improving mIoU by
2.61%~16.89% at the highest collaboration efficiency, which promotes the
performance to a state-of-the-art level.
Semantic Segmentation for Point Cloud Scenes via Dilated Graph Feature Aggregation and Pyramid Decoders
Semantic segmentation of point clouds produces a comprehensive understanding
of scenes by densely predicting the category of each point. Because the
receptive field is limited to a single scale, semantic segmentation of point
clouds still struggles to express multi-receptive field features, which leads
to the misclassification of instances with similar spatial structures. In
this paper, we propose a graph convolutional network DGFA-Net rooted in dilated
graph feature aggregation (DGFA), guided by multi-basis aggregation loss
(MALoss) calculated through Pyramid Decoders. To configure multi-receptive
field features, DGFA, which takes the proposed dilated graph convolution
(DGConv) as its basic building block, is designed to aggregate multi-scale
feature representations by capturing dilated graphs with various receptive
regions. By penalizing the receptive field information with point sets of
different resolutions as calculation bases, we introduce Pyramid Decoders
driven by MALoss to diversify the receptive
field bases. Combining these two aspects, DGFA-Net significantly improves the
segmentation performance of instances with similar spatial structures.
Experiments on S3DIS, ShapeNetPart and Toronto-3D show that DGFA-Net
outperforms the baseline approach, achieving a new state-of-the-art
segmentation performance. Comment: accepted to AAAI Workshop 202
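The abstract above does not spell out how a dilated graph is built. A minimal sketch of one common way to dilate a k-nearest-neighbor neighborhood follows: rank the other points by distance and keep every `dilation`-th one among the nearest `k * dilation`. This is an assumption for illustration and is not claimed to be the DGConv operator of DGFA-Net. The function name `dilated_knn` and the brute-force search are likewise hypothetical.

```python
import math


def dilated_knn(points, query_index, k, dilation):
    """Return indices of k neighbors of points[query_index], dilated.

    Neighbors are ranked by Euclidean distance, then every `dilation`-th
    one among the nearest k * dilation is kept, so a larger dilation
    reaches a wider receptive region with the same neighbor count.
    Brute-force search, intended only for small illustrative inputs.
    """
    query = points[query_index]
    others = [i for i in range(len(points)) if i != query_index]
    others.sort(key=lambda i: math.dist(points[i], query))
    return others[: k * dilation : dilation]


# Seven hypothetical points spaced evenly along the x-axis.
cloud = [(float(x), 0.0, 0.0) for x in range(7)]
plain = dilated_knn(cloud, query_index=0, k=3, dilation=1)
wide = dilated_knn(cloud, query_index=0, k=3, dilation=2)
```

With `dilation=1` the three closest points are selected; with `dilation=2` the same number of neighbors spans twice the distance, which is how varying the dilation yields multiple receptive regions.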
Joint Parsing and Generation for Abstractive Summarization
Sentences produced by abstractive summarization systems can be ungrammatical
and fail to preserve the original meanings, despite being locally fluent. In
this paper we propose to remedy this problem by jointly generating a sentence
and its syntactic dependency parse while performing abstraction. If generating
a word can introduce an erroneous relation to the summary, the behavior must be
discouraged. The proposed method thus holds promise for producing grammatical
sentences and encouraging the summary to stay true-to-original. The
contributions of this work are twofold. First, we present a novel neural
architecture for abstractive summarization that combines a sequential decoder
with a tree-based decoder in a synchronized manner to generate a summary
sentence and its syntactic parse. Second, we describe a novel human
evaluation protocol to assess if, and to what extent, a summary remains true to
its original meanings. We evaluate our method on a number of summarization
datasets and demonstrate competitive results against strong baselines. Comment: AAAI 2020 (Main Technical Track)